DISCUSSION OF: BROWNIAN DISTANCE COVARIANCE



By Leslie Cope 
Johns Hopkins University 

I read Distance Covariance, by Drs. Szekely and Rizzo, with great interest.
This is an elegant contribution to statistical theory; the three-way
equivalence between a weighted expectation of the difference between
Brownian covariance and two very different formulations of $\mathcal{V}^2$ is
very attractive and, together with the examples, makes a strong case for
distance covariance.

But like many statisticians, I spend much of my working life analyzing
genomic data sets and so am interested in how distance covariance and
correlation might be used in high dimensional data with relatively small
sample sizes. In these applications it is often more important to characterize
the relationships between genes than to formally test for independence. And
the Pearson correlation coefficient, complemented by a well-developed and
widely used theory of linear models and matrix methods, is highly applicable
to such data sets. The restriction to linear relationships between variables
is arguably even an advantage; while Pearson's correlation may not capture
all dependencies, we know a great deal about the interpretation of results
from its application.

It is, of course, not possible to settle the question here, but some
preliminary thoughts follow on the potential utility of distance covariance,
and particularly the scaled distance correlation, in this setting.

Using the authors' notation, if $(X, Y)$ is a pair of random variables
(vectors) and $(\mathbf{X}, \mathbf{Y})$ a sample drawn from the joint
distribution, the dependence statistics $A_{kl}$ and $B_{kl}$ are the entries
of the centered interpoint distance matrices for $\mathbf{X}$ and $\mathbf{Y}$
respectively, and $\mathcal{V}_n^2(\mathbf{X}, \mathbf{Y})$ is the mean product
moment of the entries in these two matrices. Thus, the empirical distance
covariance is a cross-variable covariance of within-variable interpoint
distances, and the distance correlation is the same, appropriately scaled. In
practice, this is similar to the correlation of correlations used by Lee et
al. (2003) and Parmigiani et al. (2004) to quantify the reproducibility of
results obtained on different microarray platforms or from independent gene
expression studies, but is more general, since it can be applied even to two
scalar-valued random variables, and because of its potential to capture
nonlinear as well as linear dependencies.
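
To make this description concrete, the following is a minimal Python sketch
(using NumPy) of the empirical statistics as described above: the interpoint
distance matrix of each sample is double-centered, and the squared distance
covariance is the mean of the entrywise product of the two centered matrices.
The function names are illustrative and are not taken from the paper. The
scaling assumed for the distance correlation is the squared covariance divided
by the square root of the product of the two squared distance variances.

```python
import numpy as np


def centered_distances(x):
    """Double-centered Euclidean interpoint distance matrix of a sample.

    Rows of `x` are observations; a 1-D array is treated as n scalars.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # n x n matrix of Euclidean distances between observations
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # subtract row and column means, add back the grand mean
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_covariance_sq(x, y):
    """Squared empirical distance covariance: mean of A_kl * B_kl."""
    a, b = centered_distances(x), centered_distances(y)
    return (a * b).mean()


def distance_correlation(x, y):
    """Empirical distance correlation (assumed scaling, see lead-in)."""
    v_xy = distance_covariance_sq(x, y)
    v_xx = distance_covariance_sq(x, x)
    v_yy = distance_covariance_sq(y, y)
    denom = np.sqrt(v_xx * v_yy)
    if denom <= 0:
        return 0.0
    # guard against tiny negative values caused by rounding
    return float(np.sqrt(max(v_xy, 0.0) / denom))
```

For an exactly linear relationship such as y = 3 - 2x, the centered distance
matrices are proportional, so this sketch returns a distance correlation of 1
up to rounding.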

This representation of the distance correlation offers some intuition into
the characteristics of the statistic. It is reflected in Theorem 4(iii),
stating that $\mathcal{R}_n(\mathbf{X}, \mathbf{Y}) = 1$ only if $\mathbf{Y}$
can be obtained from $\mathbf{X}$ by orthogonal transformation, since these
rigid transformations preserve interpoint distances up to a scaling factor. It
explains the ability, demonstrated in the first few examples, to capture
nonmonotone relationships between two variables; samples with similar
wavelength also have similar transmittance. It also helps to explain why the
method does not offer an advantage over Pearson correlation when the
relationship between the variables is monotone, even if nonlinear, as in the
example of Gumbel's bivariate exponential random variables. In the case of
monotone dependence, there is much less difference between correlating
interpoint distances and correlating the original variables.

This representation also sheds light on one property of the empirical
estimate of distance covariance that may be very important in the small
sample, high dimensional setting typical of genomic studies. Although the
estimate has been shown to be consistent, it is not unbiased. For small and
even moderate sample sizes, there can be a substantial bias, increasing with
the dimensionality of the data. Suppose that $(\mathbf{X}, \mathbf{Y})$ is a
small sample drawn from the joint distribution of $X$ and $Y$. If $i \neq j$,
then the Euclidean distance between $Y_i$ and $Y_j$ is a random value with
distribution depending on the variance of $Y$, and if $i = j$, then the
distance is 0. Even after the centering step, the distribution of values on
the diagonal of each distance matrix is very different from that found
off-diagonal, and contributes to inflated distance covariances and
correlations. As the sample size increases, the influence of the diagonal
decreases, and so this source of error vanishes in the limit.
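
The inflation described here can be examined with a small simulation. The
sketch below is an illustration under assumptions of my own choosing, not an
analysis from the paper: it draws two independent samples of standard normal
vectors, so the population distance correlation is zero, and averages the
sample distance correlation over repeated draws. The helper names and the
simulation settings (`n`, `p`, `reps`, `seed`) are hypothetical.

```python
import numpy as np


def _centered(x):
    """Double-centered Euclidean interpoint distance matrix."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def dcor(x, y):
    """Sample distance correlation from the centered distance matrices."""
    a, b = _centered(x), _centered(y)
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom <= 0:
        return 0.0
    return float(np.sqrt(max((a * b).mean(), 0.0) / denom))


def mean_null_dcor(n, p, reps=200, seed=0):
    """Average sample distance correlation between two independent
    samples of n standard normal vectors in p dimensions."""
    rng = np.random.default_rng(seed)
    values = [
        dcor(rng.standard_normal((n, p)), rng.standard_normal((n, p)))
        for _ in range(reps)
    ]
    return float(np.mean(values))
```

Comparing, say, `mean_null_dcor(10, 50)` with `mean_null_dcor(10, 1)` and with
`mean_null_dcor(100, 50)` gives a rough picture of how the average null value
changes with dimension and with sample size.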

The same bias affects other potential applications of the method. Principal
components analysis has many applications in genomic data analysis, and one
might apply the same decomposition to a matrix of pairwise distance
covariances or correlations. The consistent inflation of these quantities for
every pair of variables puts significant load on a spurious component,
depending on the variance of each variable.

This does not present problems for a permutation test of independence
based on this statistic, where the null distribution exhibits the same bias,
and the lack of power that goes with it is not unexpected when the sample
size is small. It may be that simply excluding the diagonal elements of the
distance matrices from the final covariance calculation makes for a
reasonably unbiased, finite sample estimator, but for the version presented in
this paper, the bias does complicate interpretation, and may invalidate
parametric, asymptotic tests.
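
A permutation test of the kind mentioned above might be sketched as follows.
This is a minimal illustration, not the authors' implementation: the labels of
one sample are permuted, which amounts to permuting the rows and columns of its
centered distance matrix together, and the observed statistic is compared with
the permuted values. The `drop_diagonal` option is a literal reading of the
suggestion to exclude the diagonal entries from the final average; whether it
yields a reasonably unbiased estimator is left open here, and all names are
hypothetical.

```python
import numpy as np


def _centered(x):
    """Double-centered Euclidean interpoint distance matrix."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def _mean_product(a, b, drop_diagonal=False):
    """Mean of A_kl * B_kl, optionally over off-diagonal entries only."""
    if drop_diagonal:
        mask = ~np.eye(a.shape[0], dtype=bool)
        return (a[mask] * b[mask]).mean()
    return (a * b).mean()


def permutation_pvalue(x, y, n_perm=499, drop_diagonal=False, seed=0):
    """Permutation p-value for independence of x and y."""
    rng = np.random.default_rng(seed)
    a, b = _centered(x), _centered(y)
    observed = _mean_product(a, b, drop_diagonal)
    n = a.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # relabeling the y sample permutes rows and columns of b together
        b_perm = b[np.ix_(perm, perm)]
        if _mean_product(a, b_perm, drop_diagonal) >= observed:
            exceed += 1
    # the observed arrangement is counted as one of the permutations
    return (exceed + 1) / (n_perm + 1)
```

Because the same statistic is computed for the observed and the permuted data,
any bias shared by both does not affect the validity of the resulting p-value.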

I strongly suspect that the authors are right when they say, "In summary,
distance correlation is a valuable, practical, and natural tool in data
analysis and inference. . . ," but I believe that this potential has not yet
been fully demonstrated, and I look forward to further developments that may
do so.